@UnpureRationalist

Hi, I added a vectordb benchmark script that uses the SIFT1M dataset. The dataset is available at http://corpus-texmex.irisa.fr/. Download it and unzip it to build/sift1M/*..fvecs, compile the script with make vectordb-bench, and run it with ./bin/bustub-vectordb-bench. The output should look similar to:

$ ./bin/bustub-vectordb-bench
[0.002 s] Creating vector index...
[0.002 s] Loading database
[0.286 s] Loading database, size 1000000*128
[0.286 s] Loading database, #0  #1000000
[0.500 s] Loading database, #1000  #1000000
[0.946 s] Loading database, #2000  #1000000
[1.504 s] Loading database, #3000  #1000000
[2.387 s] Loading database, #4000  #1000000
[3.333 s] Loading database, #5000  #1000000
[4.131 s] Loading database, #6000  #1000000
...
[16.443 s] Doing query, #6000  #10000
[17.930 s] Doing query, #7000  #10000
[19.434 s] Doing query, #8000  #10000
[21.172 s] Doing query, #9000  #10000
[22.887 s] Compute recalls
R@1 = 0.0143
R@10 = 0.0143
R@100 = 0.0143
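
For readers unfamiliar with the R@k lines above, here is a minimal sketch of how such a metric is commonly computed on SIFT1M, assuming the usual convention that R@k is the fraction of queries whose exact nearest neighbor appears among the top k returned ids. The function name recall_at_k, the list layouts, and the toy data are hypothetical and are not taken from the benchmark script itself.

```python
def recall_at_k(results, ground_truth, k):
    """Fraction of queries whose true nearest neighbor is in the top-k results.

    results[i]      -- ids returned for query i, best match first (assumed layout)
    ground_truth[i] -- id of the exact nearest neighbor of query i (assumed layout)
    """
    if len(results) != len(ground_truth):
        raise ValueError("results and ground_truth must cover the same queries")
    if not results:
        return 0.0
    # A query counts as a hit when its true neighbor is among the first k ids.
    hits = sum(1 for ids, truth in zip(results, ground_truth) if truth in ids[:k])
    return hits / len(results)


# Toy data (made up for illustration): 4 queries, 3 returned ids each.
results = [[7, 2, 9], [4, 1, 3], [5, 8, 0], [6, 6, 6]]
ground_truth = [7, 3, 1, 2]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(results, ground_truth, k):.4f}")
```

Under this definition R@k can only stay the same or grow as k increases, since a larger k checks a longer prefix of each result list.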

@skyzh
Owner

skyzh commented Oct 8, 2024

Thanks a lot and this is a huge improvement to the course! I’ll review once I’m back from my vacation 😍

@UnpureRationalist
Author

Thank you! BTW, running this script may cause memory errors, as mentioned in cmu-db#716; the source code needs to be modified to fix that bug. Enjoy your vacation!🌴🏖️

@skyzh skyzh self-requested a review November 10, 2024 20:41